ABSTRACT

There exists a variety of general computational methods (and variants) for ill-posed problems such as geophysical inverse problems. These have significant differences in approach and interpretation based on varying assumptions as to, e.g., the nature of measurement uncertainties. This paper addresses the following points: How are the various approaches related? What considerations should be kept in mind in selecting an approach? To what extent can one confidently rely on the results of such computation?
GEOPHYSICAL APPROACHES TO INVERSE PROBLEMS:
A METHODOLOGICAL COMPARISON
PART I: A POSTERIORI APPROACH


1. INTRODUCTION

Most scientific activity can be categorized under three heads: model construction, inference, and prediction.¹ Most models, as constructed, involve parameters which must then be experimentally determined; the elements of the relevant parameter space (required to specify the presumed physical reality within the general structural framework) consist of a finite or infinite set of numbers and/or functions. In geophysical problems these parameters typically do include specification of functions,² so that the parameter space is infinite-dimensional.³

Given a specification of the model parameters, the so-called “direct problem” is that of predicting the results of certain observations or measurements. The “inverse problem,” with which this paper is concerned, is that of making inferences (about the parameter values) from available observational data. This consists in delineating⁴ the set of those parameter values for which the predicted results are consistent with the given data. Much of our present discussion relates to the possible interpretations of this notion of “consistency with the data” and the methodological implications of these possibilities for study of the inverse problem.

We also make a fundamental distinction between a priori and a posteriori approaches. An a priori approach, here, is one which promises an arbitrarily good approximation to the solution if it can be furnished “sufficiently adequate and accurate data,” while an a posteriori approach is one which furnishes an interpretation of a set of data already obtained. Perhaps the most widely acknowledged general approach to geophysical inverse problems is that of Backus and Gilbert ([3]-[9]; see also, e.g., [18]-[21]) which, in terms of this distinction, is to be classified as primarily an a posteriori approach. In view of the nature of the organization and funding of geophysical investigation, it will necessarily be approaches of this sort which can typically be of direct practical relevance. On the other hand, we shall argue that the viewpoint of the a priori approaches is of fundamental importance in understanding the underlying nature of ill-posedness and so in understanding approaches to dealing with it. Most of the relevant mathematical literature is of this nature; see, for example, Payne [22] and Tikhonov/Arsenin [29] and the further references cited in their bibliographies (T/A [1977], especially, has an excellent bibliographic coverage of the recent Soviet work in this area).

2. AN ABSTRACT VIEW OF INVERSE PROBLEMS: A POSTERIORI CONSIDERATIONS

Consider a well-posed direct problem for which the data x can be taken in a space X with the corresponding solution y in a space Y. The assumed well-posedness of this direct problem means that there is a well-defined continuous map⁵ A : x ↦ y. The range R = { y : y = Ax for some x in X } is then the set of y in Y for which the equation

$$ Ax = y \tag{2.1} $$

has a solution.

The inverse problem, then, is to “solve” (2.1) for x. What do we mean by this? From an a 
posteriori viewpoint one assumes that what may actually be available is an “observation of y.” 

Such an observation consists of the results of a (finite) set of measurements. In general, these will 
be neither adequate nor exact. 

The inadequacy of the available observation means only that even if there were no measure- 
ment error, the set of measurements made would not suffice to specify y exactly. This considera- 
tion does not materially affect the present analysis of the implications of inexact measurement 
and need not be treated separately. 

The inexactitude or measurement uncertainty reflects the limitations of the physical measuring and recording instruments used in the observation. Thus, the nature of the assumptions made about it cannot be viewed as entirely subject to mathematical convenience but should properly be the result of a suitable analysis⁶ of the measuring process. At this point one can distinguish two quite different assumptions as to the nature of the inexactitude: either⁷
    The measurements come with specifiable error bounds,                    (2.2a)

or

    A statistical distribution (i.e., a joint probability distribution for the measurement
    components) is available for the errors.                                 (2.2b)

These types lead to different modes of analysis and rather different interpretations⁸ of the results; we will proceed to sketch both types of analysis.

For analysis, we assume first that the available data set is finite⁹, consisting of the results of K scalar measurements, so an “observation of y” is a vector ω in ℝ^K. We write x_* for the true (desired) solution, so y_* := Ax_* is the true (observable) state. If the taking of exact, error-free measurements is denoted by Ω (this is thus an operator Ω : Y → ℝ^K), then ω_* := Ωy_* is the true observation or ideal data set; the (vector) error is E := ω − ω_* in ℝ^K.

Our problem, then, is to estimate x_* (or make inferences about it) on the basis of knowledge of ω and analysis of the operators¹⁰ A and Ω. Actually, even though the nominal domain D of A may be all of X (see note 5), one typically has additional qualitative information about x_*: on “physical grounds” x_* may be known, e.g., to be everywhere non-negative, or to be a strictly increasing function, or to be “smoother” (as a function) than an arbitrary element of X need be. Such additional information can often be taken into consideration by suitable modification (restriction) of the domain¹¹, say, replacing D by D̂ as the effective domain.

We now discuss, briefly, some approaches available for inverse inference under each of the “uncertainty assumptions” (2.2a,b). For simplicity we concentrate our attention on problems in which X is a Hilbert space and A, Ω (hence, also the composed map A_K := ΩA : X → ℝ^K) are linear.¹²

Case (2.2a): We take the assumption (2.2a) to mean that a (small) set B in ℝ^K is specified¹³ in which the error E is known to lie. In the absence of other information, all one can then conclude is that x_* lies in the set of “potential solutions”

$$ S(\omega) := \{\, x \in X : [\omega - A_K x] \in B \,\}. \tag{2.3} $$
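To make the definition (2.3) concrete, here is a minimal Python sketch of a membership test for S(ω). It assumes that X has been discretized as ℝ^N (so A_K is a K × N matrix) and that B is a closed Euclidean ball; the function names are illustrative only. The example also shows why S(ω) is unbounded: any component of x that A_K does not see can be changed freely without affecting consistency with ω.

```python
# Minimal sketch of the set S(omega) of (2.3).
# Assumptions: X is discretized as R^N, A_K is a K x N matrix stored as a list of
# rows, and B is the closed Euclidean ball of radius `radius` in R^K.

def apply(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def in_potential_solutions(A_K, x, omega, radius):
    """True if omega - A_K x lies in B, i.e. if x belongs to S(omega)."""
    residual = [w - v for w, v in zip(omega, apply(A_K, x))]
    return sum(r * r for r in residual) ** 0.5 <= radius

# K = 2 measurements of a 3-component model; the third component is invisible to A_K.
A_K = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]
omega = [1.0, 2.0]
print(in_potential_solutions(A_K, [1.0, 2.0, 0.0], omega, 0.1))  # True
print(in_potential_solutions(A_K, [1.0, 2.0, 1e6], omega, 0.1))  # True: S(omega) is unbounded
print(in_potential_solutions(A_K, [1.5, 2.0, 0.0], omega, 0.1))  # False: inconsistent with omega
```

The same test is what underlies rejecting a hypothetical model as “inconsistent with the observation.”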
We emphasize that all elements of S(ω) are equally consistent with the observation ω and that only the introduction of additional considerations can in any way suggest the selection of any one element of S(ω) as being a “better” solution than any other. One might hope to select x_K in S(ω) for which the error bound

$$ \|x_K - x_*\| \le \sup\{\, \|x - x_K\| : x \in S(\omega) \,\} \tag{2.4} $$

would give useful information, but, for the problems under consideration, S(ω) will always be unbounded (A_K has only a K-dimensional range on the infinite-dimensional space X, so its null space 𝒩(A_K) is nontrivial and S(ω) + 𝒩(A_K) = S(ω)), so the right-hand side of (2.4) would, uselessly, be infinite.

On the other hand, given auxiliary information determining an effective domain D̂, the set of potential solutions becomes

$$ \hat S(\omega) := \{\, x \in \hat D : [\omega - A_K x] \in B \,\} = S(\omega) \cap \hat D \tag{2.3'} $$

(for the case of Ŝ(ω) empty, see note 4). One may now wish to select x_K so that (2.4) becomes

$$ \|x_K - x_*\| \le \sup\{\, \|x - x_K\| : x \in \hat S(\omega) \,\} \tag{2.4'} $$

with the hope that the right-hand side of (2.4') might be usefully small.¹⁴

Since ω, as given, is a point of the K-dimensional space ℝ^K,¹⁵ it is plausible to expect to be able to use this observation to determine K scalar parameters specifying a “nominal solution” x_K to be selected from a suitably chosen K-dimensional subspace X_K of X, i.e., to seek x_K as an estimate (approximation) for x_*. We first consider the selection method:

$$ x_K \in X_K, \qquad A_K x_K = \omega \tag{2.5} $$

(any reasonable choice of X_K makes the restriction Â_K of A_K to X_K invertible). Equivalently,

$$ x_K := \hat A_K^{-1}\,\omega. \tag{2.5'} $$
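The selection method (2.5) can be sketched in Python as follows, assuming X has been discretized as ℝ^N and X_K is given by an explicit basis of K vectors; the helper names are hypothetical. The two calls use different choices of X_K and produce different x_K, both of which reproduce ω exactly, which is precisely the non-uniqueness the error bounds below must account for.

```python
# Sketch of the selection method (2.5)/(2.5'): find x_K in X_K = span(basis) with
# A_K x_K = omega. Assumptions: X discretized as R^N, A_K is K x N (list of rows),
# and `basis` holds K vectors of length N spanning X_K.

def solve(M, b):
    """Solve the square system M c = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("restriction of A_K to X_K is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (aug[r][n] - sum(aug[r][j] * c[j] for j in range(r + 1, n))) / aug[r][r]
    return c

def select_solution(A_K, basis, omega):
    """Return x_K in span(basis) with A_K x_K = omega."""
    # Column j of the K x K matrix of the restriction is A_K applied to basis vector j.
    images = [[sum(a * p for a, p in zip(row, phi)) for row in A_K] for phi in basis]
    M = [[images[j][i] for j in range(len(basis))] for i in range(len(A_K))]
    coeffs = solve(M, omega)
    n = len(basis[0])
    return [sum(coeffs[j] * basis[j][k] for j in range(len(basis))) for k in range(n)]

A_K = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]
omega = [1.0, 2.0]
print(select_solution(A_K, [[1, 0, 0], [0, 1, 0]], omega))  # [1.0, 2.0, 0.0]
print(select_solution(A_K, [[1, 0, 1], [0, 1, 0]], omega))  # [1.0, 2.0, 1.0]
```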
Note that this selection of x_K need not be in Ŝ(ω). Now let θ be the angle¹⁶ between X_K and 𝒩(A_K), the null space of A_K, let P be the orthogonal projection on X_K, and

$$ u := \|\hat A_K^{-1}\| = 1 \Big/ \inf\{\, \|A_K x\|_{\mathbb{R}^K} : x \in X_K,\ \|x\| = 1 \,\}. \tag{2.6} $$

An easy geometric argument then shows that, writing ‖x_* − X_K‖ for the distance from x_* to X_K,

$$ \|x_* - x_K\|^2 \le \|x_* - X_K\|^2 + \bigl[\, \|x_* - X_K\| \cot\theta + u\,\|E\| \,\bigr]^2. \tag{2.7} $$
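The “easy geometric argument” behind (2.7) can be filled in as follows. This is a sketch under stated assumptions (A_K maps onto ℝ^K and Â_K is invertible, so that X = X_K ⊕ 𝒩(A_K); P is the orthogonal projection on X_K; ‖x_* − X_K‖ is the distance from x_* to X_K), not necessarily the derivation the author had in mind.

```latex
\begin{aligned}
&x_* = z + n, \quad z \in X_K,\ n \in \mathcal{N}(A_K)
  &&\text{(decomposition } X = X_K \oplus \mathcal{N}(A_K)\text{)}\\
&\hat A_K (x_K - z) = \omega - A_K x_* = E
  \;\Longrightarrow\; \|x_K - z\| \le u\,\|E\| \\
&x_* - x_K = (I - P)\,n + \bigl(Pn - \hat A_K^{-1}E\bigr),
  \qquad (I - P)\,n = (I - P)\,x_* \\
&\|Pn\| \le \|(I - P)\,n\|\cot\theta
  &&\text{(the angle between } n \text{ and } X_K \text{ is at least } \theta\text{)}\\
&\|x_* - x_K\|^2 = \|x_* - X_K\|^2 + \|Pn - \hat A_K^{-1}E\|^2
  \le \|x_* - X_K\|^2 + \bigl[\,\|x_* - X_K\|\cot\theta + u\,\|E\|\,\bigr]^2
\end{aligned}
```

The same decomposition gives (2.8): ξ·x_* − ξ·x_K = ξ·n − ξ·Â_K^{-1}E, with |ξ·n| ≤ ‖n‖ cos ψ and ‖n‖ ≤ ‖x_* − X_K‖ csc θ.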

Similarly, we may follow Backus-Gilbert [1968] in considering not the approximate determination of x_* by x_K but the approximate determination of ξ·x_* by ξ·x_K (where ξ is a linear functional,¹⁷ normalized so ‖ξ‖ = 1). Letting ψ be the angle between ξ and the null space of A_K, an argument much like the derivation of (2.7) gives the error bound

$$ |\xi\cdot x_* - \xi\cdot x_K| \le \|x_* - X_K\| \csc\theta \cos\psi + u\,\|E\|. \tag{2.8} $$

We note that control of the factor cos ψ in (2.8) is independent of the choice of X_K and is precisely the effect of the “criterion of δ-ness” applied to A_K in Backus-Gilbert [1968].

Minimization of the factors cot θ in (2.7) and csc θ in (2.8) is achieved by choosing X_K to be ℛ(A_K^*), the range of A_K^* (so that X_K ⊥ 𝒩(A_K), θ = π/2, cot θ = 0 and csc θ = 1), as is done in Seidman [1975]. There, this choice of X_K appeared as the result of a variational formulation, choosing x_K so as to minimize ‖x‖ subject to the constraint A_K x = ω. More generally (compare, e.g., Seidman [1978a]), one might replace (2.5) by the constrained variational method:

$$ x_K \text{ giving } \min\{\, \|x - \bar x\| : x \in \hat S(\omega) \,\}, \tag{2.9} $$

where x̄ is some plausibly guessed approximation to x_*. If one temporarily ignores any restriction to D̂ and if B is the ball¹⁸ in ℝ^K of radius β, then (for β small enough) it is easy to see that (2.9) is equivalent to the unconstrained variational method:

$$ \min\{\, \alpha\|x - \bar x\|^2 + \|\omega - A_K x\|^2 : x \in X \,\} \tag{2.9'} $$

for some suitable choice¹⁹ of the parameter α. The method (2.9') is, of course, the well-known Tikhonov regularization approach to ill-posed problems.²⁰ ²¹ Note that (2.9') gives (x_K − x̄) in X_K := ℛ(A_K^*) and so can be handled K-dimensionally. In the case of a Euclidean norm for ℝ^K, (2.9') is equivalent to a linear system in X_K (the regularized normal equations; cf., e.g., Wahba [197?]):

$$ (\alpha + Q)(x_K - \bar x) = A_K^*(\omega - A_K \bar x), \tag{2.10} $$

where Q := A_K^* A_K. This leads to the error bound²²

$$ \begin{aligned} \|x_* - x_K\|^2 &= \bigl\|(\alpha + Q)^{-1}[\alpha P(x_* - \bar x) - A_K^* E]\bigr\|^2 + \|(I - P)(x_* - \bar x)\|^2 \\ &\le \|(x_* - \bar x) - \mathcal{R}(A_K^*)\|^2 + \Bigl[\frac{\alpha u^2}{\alpha u^2 + 1}\,\|Px_* - P\bar x\| + \min\{u, (2\sqrt{\alpha})^{-1}\}\,\|E\|\Bigr]^2. \end{aligned} \tag{2.11} $$
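Because x_K − x̄ lies in ℛ(A_K^*), (2.10) can be solved by writing x_K = x̄ + A_K^* c and solving the K × K system (αI + A_K A_K^*) c = ω − A_K x̄; substituting and using (α + A_K^*A_K)A_K^* = A_K^*(α + A_K A_K^*) recovers (2.10). A minimal Python sketch, assuming a Euclidean inner product on a discretized X = ℝ^N (so that A_K^* is the transpose); the function names are illustrative.

```python
# Sketch of (2.9')/(2.10) handled K-dimensionally.
# Assumptions: X discretized as R^N with the Euclidean inner product (so A_K^* = A_K^T),
# Euclidean norm on R^K, and A_K stored as a list of K rows.

def solve(M, b):
    """Solve a square system M c = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):
        c[r] = (aug[r][n] - sum(aug[r][j] * c[j] for j in range(r + 1, n))) / aug[r][r]
    return c

def tikhonov(A_K, omega, xbar, alpha):
    """Minimizer of alpha*||x - xbar||^2 + ||omega - A_K x||^2, computed as xbar + A_K^T c."""
    K, N = len(A_K), len(xbar)
    # G = alpha*I + A_K A_K^T is only K x K, however large N is.
    G = [[alpha * (i == j) + sum(A_K[i][k] * A_K[j][k] for k in range(N))
          for j in range(K)] for i in range(K)]
    rhs = [omega[i] - sum(A_K[i][k] * xbar[k] for k in range(N)) for i in range(K)]
    c = solve(G, rhs)
    return [xbar[k] + sum(A_K[i][k] * c[i] for i in range(K)) for k in range(N)]

A_K = [[1.0, 0.0, 0.0],
       [0.0, 2.0, 0.0]]
x_K = tikhonov(A_K, omega=[1.0, 2.0], xbar=[0.0, 0.0, 5.0], alpha=1.0)
print(x_K)  # [0.5, 0.8, 5.0]: the component invisible to A_K keeps its prior value
```

Note that the component of x̄ in 𝒩(A_K) passes through unchanged, which is why the first term of (2.11) does not improve with the data.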

Case (2.2b): The probabilistic analysis assumes a probability distribution P_E over ℝ^K for the measurement error²³ E := ω − ω_*; for simplicity we adopt the usual normality assumption²⁴, that P_E is a multivariate normal distribution with mean 0 and covariance matrix σ²I (corresponding to the assumption that the errors in the individual measurements are independent with common variance σ², perhaps after scaling). One can then proceed in either of two ways, corresponding to the standard statistical procedures of hypothesis testing (establishing “confidence intervals”) and of parameter estimation. We consider only the latter.

The construction of x_K given above by (2.9') may be viewed as defining an estimator, with the choices of x̄ and α now corresponding to an essentially Bayesian attitude toward prior information.²⁵ Note that choosing α large means a small variance for the estimator, but its mean, which depends on the earlier estimate x̄, then reflects less of the effect of the new observation ω. Comparably to (2.11), consider the expectation²⁶ of ‖x_* − x_K‖² based on this P_E:

$$ \begin{aligned} \operatorname{Exp}\|x_* - x_K\|^2 &= \|(x_* - \bar x) - \mathcal{R}(A_K^*)\|^2 + \bigl\|(\alpha + Q)^{-1}\alpha P(x_* - \bar x)\bigr\|^2 + \sigma^2 \operatorname{tr}\bigl(Q(\alpha + Q)^{-2}\bigr) \\ &\le \|(x_* - \bar x) - \mathcal{R}(A_K^*)\|^2 + \Bigl(\frac{\alpha u^2}{\alpha u^2 + 1}\Bigr)^2 \|Px_* - P\bar x\|^2 + \frac{1}{4\alpha}\operatorname{Exp}\|E\|^2. \end{aligned} \tag{2.12} $$
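A quick Monte Carlo check of the equality in (2.12) can be sketched as follows. It assumes coordinates in which A_K acts diagonally with singular values s_k on ℛ(A_K^*) and annihilates one further direction, takes x̄ = 0, and uses independent Gaussian errors; all numbers are arbitrary illustrations, not data from the text.

```python
import random

# Assumed setting: A_K = diag(s_1, ..., s_K) on R(A_K^*), zero on one extra direction.
# p[k] are the components of P(x_* - xbar), q the component of (I - P)(x_* - xbar),
# xbar = 0, and E ~ N(0, sigma^2 I).
s = [1.0, 0.1]
p = [0.5, 0.5]
q = 0.3
sigma, alpha = 0.05, 0.01

def exact_expectation():
    """Right-hand side of the equality in (2.12), written in these coordinates."""
    return q * q + sum((alpha ** 2 * pk ** 2 + sigma ** 2 * sk ** 2) / (alpha + sk ** 2) ** 2
                       for sk, pk in zip(s, p))

def monte_carlo(trials=20000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        err2 = q * q  # the null-space component is untouched by the data
        for sk, pk in zip(s, p):
            omega_k = sk * pk + rng.gauss(0.0, sigma)   # noisy datum
            xk = sk * omega_k / (alpha + sk ** 2)       # (2.10) in diagonal form
            err2 += (pk - xk) ** 2
        total += err2
    return total / trials

# The exact value is about 0.2175; the sampled mean should agree to within about 1%.
print(exact_expectation(), monte_carlo())
```

Here the weakly observed direction (s = 0.1) contributes most of the error, mainly through the noise term rather than the bias term.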

3. DISCUSSION

Summarizing the above, we see that the data set ω can provide direct information only about the projection of x_* along 𝒩(A_K) onto ℛ(A_K^*). Any inference beyond this must involve the restriction to a “small” set D̂ by virtue of a priori auxiliary information. The real value of such an a posteriori analysis is often just the possibility, alluded to in note 4, of rejecting a hypothetical x̄ (or a suggested value for a functional) as “inconsistent with the observation” ω if x̄ is not in S(ω). In the probabilistic case, the normality assumption implies that, regardless of what x_* may be, no ω is entirely impossible. Given ω, one cannot, then, with absolute certainty rule out any x̄ but can only reject the hypothesis that this x̄ is, indeed, x_* on the grounds that this would require belief in the occurrence of an event so unlikely (i.e., ‖E‖ so large) as to constitute a “miracle.”

In either deterministic or probabilistic settings, the method (2.9') is a useful procedure for estimating x_* on the basis of an earlier estimate x̄ together with a new set of observational data ω. From the bounds (2.11), (2.12) we conclude that an increase in K will presumably decrease the first term (as X_K = ℛ(A_K^*) increases), but ‖Px_* − Px̄‖ and u will increase,²⁷ so α must decrease if the second term is to decrease, whence, finally, one must require greater accuracy of measurement²⁸ to make the last term decrease as well. Note that while the bounds (2.11), (2.12) are correct, they are strongly dependent for their precise form on the specific hypotheses made and have been presented here only as suggestive. Along the lines of, e.g., Backus/Gilbert [7-9] or Parker [18-20], one could at this point elaborate on the expectation that intrinsically nonlinear problems would qualitatively exhibit essentially the same behavior, with the linear operator A of the analyses above now obtained via linearization, e.g., as the derivative²⁹ at x̄ of the actual nonlinear map.

The paragraph above, of course, initiates consideration of (asymptotic) a priori analysis, viewing the procedure described in the preceding section as merely one of a sequence of increasingly more accurate (one hopes) approximations. It should be clear from the above that, unless considerable care is taken in matching one's expectations and procedures to the increasingly poor conditioning of the computation, the approximation could actually get worse³⁰ rather than better.

It is worth noting that the usual procedure (adopted above) of using the data to construct an approximant (estimate) x_K is one which makes sense only in this asymptotic context. While we were able to obtain bounds (2.11), (2.12), the first term ‖(x_* − x̄) − ℛ(A_K^*)‖ will (unless one can somehow make effective use of a restriction³¹ to a “small” D̂) decrease arbitrarily slowly and, in any case, depends on the unknown x_*, so one cannot really know how good an approximation any particular computed x_K may be. Asymptotically, however, one can ensure convergence to x_* of the sequence of approximants by employing data which “in the limit” are both adequate and accurate (i.e., ‖(x_* − x̄) − ℛ(A_K^*)‖ → 0 and ‖E‖ → 0 sufficiently rapidly as K → ∞).

4. SUMMARY 

The principal use of experiment lies in the possibility of rejecting a model. To the extent that the predicted observations are comparatively insensitive to (certain aspects of) the model, one finds it difficult to make useful distinctions without imposing unrealistic requirements on the extent and accuracy of the observations. The role of “inverse theory” must lie primarily in making efficient use of what data may be available and in evaluating the range of models consistent with these, rather than in selecting a single model. Nevertheless, it is possible to construct estimates of model parameters (functions) from observations. The flexibility in choosing norms for measuring errors leads to considerable arbitrariness in constructing algorithms;³² the primary criteria are that norms for measurement errors reasonably reflect the properties of the physical instruments and that norms for parameter functions reflect the combination of theoretical assumptions and the uses to which the results will be put. In general, only strong a priori assumptions (permitting advance restriction to a “small” set) will make possible useful explicit error estimates; instead, procedures can be compared on the basis of the asymptotic properties of approximation schemes in which they can be embedded.³³ Moreover, procedures can appear quite different (e.g., the explicit pseudo-solution (2.5) and the variational formulation (2.9')) yet (cf. note 21) be quite closely related. In general, the relative computational convenience of procedures will strongly depend on the availability of bases for X which make the matrix representations of Q (as of A and A^*) relatively sparse.

NOTES 

¹ To these one may add (e.g., in engineering contexts) a fourth category, design: to the extent that some parameters or structural elements are at our disposal, select these so that the predicted results are as desired (perhaps, also, optimizing the selection with respect to some specified criterion).

² As, e.g., the density distribution of the earth or the form of the nonlinearity in a problem of flow through a porous medium.

³ With a suitable choice of metric, this may often be conveniently represented as (a subset of) some Hilbert space.

⁴ Note that this includes the notion of verification or validation of a theory or model: if the “consistent parameter set” were void, one would have disproved the theory!

⁵ The map A need not be defined for every x in X, and we let D be the set of x for which Ax is defined (as A is continuous, we always take D to be closed in X). Thus, it is really D which is the set of “admissible data” x and the true domain of A, but we abuse notation slightly by continuing to write A : X → Y.

⁶ The complexities of such an analysis lead us to sacrifice some potential accuracy to permit estimating the measurement uncertainties in a “more standardized,” simplified form. Considerable mathematical work remains to be done to justify this procedure in terms of some notion of “robustness.”


⁷ This typology is simplifying but not complete: it would certainly be possible, for example, to consider mixed types.
⁸ Neither treatment of errors can be considered superior; e.g., in the simplest (scalar) setting it must clearly be a matter of “personal taste” to have a preference between a Gaussian error distribution with standard deviation σ = 10^(…) and an absolutely certain error bound of 10^(…).

⁹ Some measurement modes do give a continuous record (e.g., a seismographic track), but information-theoretic considerations (recording resolution, pen width, etc.) can be used to show equivalence of these with discrete (sampled-data) observations.

¹⁰ What is actually under consideration in this section (i.e., for a posteriori data interpretation) is just the composed map ΩA : X → ℝ^K. From this viewpoint, any concern for the operator A as such (rather than as part of the analysis of ΩA) is irrelevant. Similarly, any direct involvement of the intermediate space Y, say, in first using ω to construct an estimate ŷ of the state y_* preparatory to attempting to invert A to approximate x_* by A⁻¹ŷ, would be essentially misplaced.

¹¹ Making effective use of such information is often the most significant problem in treating such situations. One special form of this is of particular importance. It can happen that the set D̂ (taken as the effective domain of A through the use of auxiliary information) is a compact subset of X. In this case, assuming A is one-to-one (uniqueness in D̂ of solutions of (2.1)), a standard theorem in topology assures us that, after restricting its domain to D̂, the operator A is actually continuously invertible, so in this case the inverse problem is no longer ill-posed, provided computationally effective use can be made of the compactness of D̂. The composed map ΩA is unlikely to be one-to-one, of course, so the above does not apply directly; nevertheless, this compactness is exactly the consideration needed to obtain useful error estimates. This compactness typically follows from smoothness assumptions on x_* via the Rellich-Kondrachov theorems (cf., e.g., Adams).

¹² For example, the direct problem relating the gravitational field in the observable regions to the internal density distribution of the earth is linear. However, the typical measurement process here obtains only the field strength, so Ω would not be linear. Indeed, the linearity assumption seems to exclude almost all problems of direct geophysical interest. Nevertheless, the discussion here is of more than merely tutorial significance and may also be relevant to consideration of intrinsically nonlinear problems through linearization; compare the penultimate section of Parker [1977].

¹³ A more general formulation would permit B to depend on ω, e.g., if it were the relative magnitude of errors which could be controlled or estimated. In particular, this would be the case if a significant source of uncertainty in the result were uncertainty in specification of the model: if (2.1) is used when the “true” operator is A_#, then, even in the absence of any further error, one would have observed A_# x_* while computing on the assumption that y_* = Ax_*, so E = [A_# − A] x_*. The nature of the modifications needed for the analysis in the more general case is conceptually clear, but they are inordinately more complex computationally. Compare, e.g., Theorem 4.5 of Seidman [1978a].

¹⁴ The substitution of (2.4') for (2.4) will not help if Ŝ(ω) is still unbounded, or is bounded but too large. A fuller understanding of the reason depends on more detailed consideration of the asymptotic analysis (cf. Seidman [1978a]), but we see that if D̂ is compact (see note 11) and B small enough, then useful estimates may be available.

¹⁵ In practice one may well use measurements embodying some redundancy (linear dependence, in the present context) so that the range of A_K is actually a proper subspace R of ℝ^K. In this case our first step might be to replace the observation ω (which, due to measurement error, might not actually be in the subspace R) by, say, ω̂, the nearest point in R to ω. In some sense, what we wish to do is to replace the set [ω + B] by the set [ω + B] ∩ R, which typically has the form [ω̂ + B̂], and replace ℝ^K by R. Note that the use of B̂ instead of B may correspond to a substantial decrease in the actual uncertainty: this would be the “payoff” for the redundancy. Having replaced ℝ^K by R, we return to the original notation with the assumption that A_K is “onto.”
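The replacement of ω by the nearest point ω̂ in R can be sketched as an orthogonal projection, assuming the Euclidean norm on ℝ^K (with a different norm for B, “nearest” would change). The sketch below, with an illustrative function name, orthonormalizes the columns of A_K by Gram-Schmidt, discarding redundant ones.

```python
# Sketch of note 15, assuming the Euclidean norm on R^K: omega_hat is the orthogonal
# projection of omega onto R = span of the columns of A_K (given as column vectors).

def project_onto_range(columns, omega):
    basis = []
    for v in columns:
        w = list(map(float, v))
        for b in basis:  # Gram-Schmidt: remove components along earlier basis vectors
            d = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - d * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > 1e-12:  # skip linearly dependent (redundant) columns
            basis.append([wi / norm for wi in w])
    proj = [0.0] * len(omega)
    for b in basis:
        d = sum(oi * bi for oi, bi in zip(omega, b))
        proj = [pi + d * bi for pi, bi in zip(proj, b)]
    return proj

# Two repetitions of one scalar measurement: A_K has the single column (1, 1).
print(project_onto_range([[1, 1]], [1.0, 3.0]))  # approximately [2.0, 2.0]
```

In this repeated-measurement case the projection simply averages the two readings, which is the variance-reducing “payoff” the note describes.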
¹⁶ The angle γ between two subspaces U, V is defined by cos γ := sup{ u·v : u ∈ U, v ∈ V, ‖u‖ = 1 = ‖v‖ }. Thus, this definition of θ is equivalent to (π/2 − θ) being the angle between the two K-dimensional subspaces X_K and the range of A_K^*. Note that the minimum value of sup{ ‖x − X_n‖ : x ∈ D̂ }, taken over all possible n-dimensional subspaces X_n, is called the n-width of the set D̂; this notion has been extensively studied in certain contexts (cf., e.g., Jerome [1967]).

¹⁷ Actually taking ξ to be point evaluation, as Backus and Gilbert do, is, of course, possible only if X consists of smooth enough functions for such functionals to be continuous. In that case X will be a reproducing kernel Hilbert space (cf., e.g., Aronszajn [1950]), and it may be convenient to express some of our formulas a bit differently in terms of the reproducing kernel. Typical cases lead to the use of spline spaces; see, e.g., Schoenberg [1946], Schumaker [1980], Greville [1969].

¹⁸ This is actually the most reasonable possibility for B if one uses an appropriate norm in ℝ^K related to the nature of the measurement uncertainties. This appropriate norm will not, in general, be the usual Euclidean norm for ℝ^K but is more likely to take the form

$$ \|\omega\|_\mu := \max\{\, |\omega_k| / \mu_k : k = 1, \ldots, K \,\}, $$

where μ_k measures the (relative) uncertainty of measurement of the k-th scalar component.
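With this weighted maximum norm, “E lies in the ball B” simply says that every component error stays within its own tolerance. A minimal sketch, where the weights μ_k and the function names are illustrative assumptions:

```python
# Sketch of the weighted max-norm of note 18; mu[k] > 0 is the (relative)
# uncertainty of the k-th measurement.

def weighted_max_norm(omega, mu):
    """||omega||_mu = max_k |omega_k| / mu_k."""
    return max(abs(w) / m for w, m in zip(omega, mu))

def error_in_B(E, mu, radius=1.0):
    """B is the ball of the given radius in this norm: each |E_k| <= radius * mu_k."""
    return weighted_max_norm(E, mu) <= radius

mu = [0.2, 0.5]
print(weighted_max_norm([0.1, -0.3], mu))  # 0.6
print(error_in_B([0.1, -0.3], mu))         # True
print(error_in_B([0.3, 0.0], mu))          # False: |E_1| = 0.3 exceeds mu_1 = 0.2
```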

¹⁹ Clearly the choice of α depends on the radius β of B (with α → 0 as β → 0) but, except asymptotically (compare the considerations of our next section), the appropriate choice is hard to determine; cf., e.g., Morozov [1967], Craven & Wahba [1977]. We choose this point at which to note that the norm for X is also somewhat arbitrary (for example, it can always be chosen to make a fairly arbitrary X_K orthogonal to the null space of A_K without changing the topology of X), and some study has been given to the optimal choice of the X-norm for (2.9'); cf. Cullum [1979].
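One standard way of tying α to the radius β is Morozov's discrepancy principle: choose α so that the residual ‖ω − A_K x_α‖ equals β. The following Python sketch assumes a diagonal A_K with positive singular values s_k, x̄ = 0 and Euclidean norms; it is an illustration of that principle, not a prescription from the text.

```python
import math

# Sketch of Morozov's discrepancy principle for choosing alpha in (2.9').
# Assumptions: A_K = diag(s_1, ..., s_K) with s_k > 0, xbar = 0, Euclidean norms.

def residual_norm(s, omega, alpha):
    # For diagonal A_K, (2.10) gives x_k = s_k omega_k / (alpha + s_k^2), so the
    # k-th residual component is alpha * omega_k / (alpha + s_k^2).
    return math.sqrt(sum((alpha * w / (alpha + sk * sk)) ** 2 for sk, w in zip(s, omega)))

def discrepancy_alpha(s, omega, beta, lo=1e-12, hi=1e12, iters=200):
    """Bisection in log(alpha); the residual increases monotonically with alpha."""
    if not residual_norm(s, omega, lo) < beta < residual_norm(s, omega, hi):
        raise ValueError("beta must lie strictly between the attainable residuals")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if residual_norm(s, omega, mid) < beta:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

s = [1.0, 0.1, 0.01]
omega = [1.0, 0.5, 0.2]
alpha = discrepancy_alpha(s, omega, beta=0.05)
print(alpha, residual_norm(s, omega, alpha))  # the residual matches beta = 0.05
```

Monotonicity holds because each factor α/(α + s_k²) increases with α, so bisection is safe.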

²⁰ See Tikhonov [1963] and, for a more complete treatment and an extensive bibliography (especially of the more recent Soviet literature, including over 20 specifically geophysical applications, especially by Glasko and by Prilepko), see Tikhonov-Arsenin [1977].
²¹ In the limiting case β = 0+ (corresponding to assumed error-free measurement: ω = ω_*) one would theoretically be solving (2.5), corresponding to (2.9') with α = 0. However, it is characteristic of such “inverse problems” that Â_K, although invertible, is quite badly conditioned even for moderate values of K. We note that regularization is a recommended approach (cf., e.g., ) to the solution of such badly conditioned systems in the finite-dimensional case, so computationally (2.5) and (2.9') are not really very distinct.
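How badly conditioned (2.5) can be is easy to see in an assumed model problem (not taken from the text): the moment problem A_K x = (∫₀¹ s^k x(s) ds), k = 0, …, K−1, on L²(0,1). With X_K = ℛ(A_K^*), (2.5) reduces to a K × K system whose matrix A_K A_K^* is the Hilbert matrix with entries 1/(i + j + 1). The sketch below uses exact rational arithmetic, so the growth it shows comes from the problem itself and not from rounding.

```python
from fractions import Fraction

# Illustration under an assumed model problem (moments of x on [0, 1]).
# With X_K = R(A_K^*), (2.5) becomes G c = omega with G = A_K A_K^*, the K x K
# Hilbert matrix with entries 1/(i + j + 1).

def hilbert(K):
    return [[Fraction(1, i + j + 1) for j in range(K)] for i in range(K)]

def solve_exact(M, b):
    """Gauss-Jordan elimination over the rationals, so no rounding error enters."""
    n = len(M)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def amplification(K):
    """||delta c|| / ||delta omega|| for a unit change in the last datum."""
    delta = [Fraction(0)] * (K - 1) + [Fraction(1)]
    dc = solve_exact(hilbert(K), delta)  # by linearity, delta c solves G delta_c = delta
    return float(sum(v * v for v in dc)) ** 0.5

for K in (1, 2, 4, 6):
    print(K, amplification(K))
```

The amplification grows by orders of magnitude already for K ≤ 6, which is the “moderate values of K” phenomenon described in the note.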

²² The bound (2.11) reduces to (2.7) with cot θ = 0 on taking α → 0. In general, (2.11) offers no improvement over (2.7); the advantage of (2.9') is the greater computational stability provided by regularization for the ill-conditioned system (2.5).

²³ As in case (2.2a), our assumption is essentially that the uncertainty is wholly due to measurement uncertainty which is additive and uniform. Modifications of uniformity would also be possible here along the lines of note 13. Note that if P_E were to have (small) bounded support B, then all the considerations of case (2.2a) would apply as well.

²⁴ See any elementary statistical text on hypothesis testing, especially for the multivariate case (e.g., Morrison [1976]), for the implications and justification (in terms of the Central Limit Theorem) of this assumption.

²⁵ Here x̄ represents an estimate based on earlier observations, and α will be large or small depending on one's degree of confidence in the accuracy of x̄ (actually, in the relative accuracies of Px̄ and of Â_K⁻¹ω). In some contexts one could simply include the earlier observations with the new ones in a single step, but (2.9') is a far simpler “updating procedure.” Further, the format of the earlier measurements might be quite different (e.g., x̄ might correspond to an earlier estimate of an interior density distribution for the Earth based on seismic data, whereas the new observations may be gravimetric) and have a different type of error statistics; indeed, the old measurements may no longer be available at all.


²⁶ One can carry this analysis a step further by viewing x̄ as itself a random variable (e.g., its randomness might derive from measurement errors in the earlier observations from which it had been obtained); assume x̄, E independent, with P(x_* − x̄) having mean 0 and covariance matrix ρ²I (in X_K). Then the expectation in (2.12) must be modified by considering the distribution of x̄. It can be shown that this is minimized by taking α := σ²/ρ² in (2.9'), which gives, altogether,

$$ \operatorname{Exp}\|x_* - x_K\|^2 = \operatorname{Exp}\|(x_* - \bar x) - X_K\|^2 + \sum_{k=1}^{K} \frac{\sigma^2\rho^2}{\sigma^2 + \rho^2\lambda_k} \le \operatorname{Exp}\|(x_* - \bar x) - X_K\|^2 + K\,\frac{u^2\sigma^2\rho^2}{u^2\sigma^2 + \rho^2}, \tag{2.12'} $$

where λ_1 ≥ … ≥ λ_K = u⁻² are the eigenvalues of Q on X_K. Similar analyses have been used (e.g., Franklin [1970]) to estimate the optimal α with x̄ := 0, but viewing the unknown x_*, itself, as a random variable rather than as strictly determinate.
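The optimality of α = σ²/ρ² can be checked numerically. In the assumed diagonal setting below, the expected value of the α-dependent part of the error (the first term on the right of the equality in (2.11)) is a sum over eigenvalues s_k² of Q; a grid search over α lands on σ²/ρ², and the value there matches the closed form in (2.12'). The numbers are arbitrary illustrations.

```python
# Assumed setting for note 26: Q has eigenvalues s_k^2 on X_K, P(x_* - xbar) ~ N(0, rho^2 I)
# and E ~ N(0, sigma^2 I), independent. The function is the expected value of
# ||(alpha + Q)^{-1}[alpha P(x_* - xbar) - A_K^* E]||^2.

def expected_range_error(alpha, s, sigma, rho):
    return sum((alpha ** 2 * rho ** 2 + sigma ** 2 * sk ** 2) / (alpha + sk ** 2) ** 2
               for sk in s)

s, sigma, rho = [1.0, 0.3, 0.1, 0.03], 0.1, 1.0
grid = [10 ** (-4 + 0.01 * j) for j in range(401)]  # alpha from 1e-4 to 1
best = min(grid, key=lambda a: expected_range_error(a, s, sigma, rho))
alpha_star = sigma ** 2 / rho ** 2
closed_form = sum(sigma ** 2 * rho ** 2 / (sigma ** 2 + rho ** 2 * sk ** 2) for sk in s)
print(best, alpha_star)
print(expected_range_error(alpha_star, s, sigma, rho), closed_form)
```

The minimizer does not depend on the s_k: each term's derivative in α is proportional to (αρ² − σ²), so every term is minimized at the same α.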

²⁷ This is an inescapable consequence of the ill-posedness of the problem: from (2.6), with X_K = ℛ(A_K^*),

$$ u = \lambda_K^{-1/2}, $$

where λ_K is the K-th eigenvalue (λ_1 ≥ λ_2 ≥ … ≥ λ_K > 0) of A_K A_K^*. For a priori analysis, an asymptotic knowledge of λ_1, λ_2, … can be useful in asymptotically making an appropriate choice of α in (2.9'). This increase in u, often rapid with K, makes the computation ill-conditioned even though Â_K is invertible (see note 21), and this ill-conditioning dominates the computational aspects of the problem.

²⁸ For the probabilistic case it is instructive to “fudge” the interpretation slightly and suppose the measurement operator were really Ω' : Y → ℝ^{K'}, with K' a multiple of K, corresponding to repetition of a “basic” measurement operator Ω. If σ²I were the covariance matrix in ℝ^{K'} for measurement errors, then defining Ω by averaging the values of repeated measurements in Ω' gives (K/K')σ²I as the covariance matrix in ℝ^K for the errors in Ω. More complicated analyses cover situations in which the actual observational data is “highly redundant” although not just repetitive, but the general strategy remains that one makes a “K-dimensional estimate x_K” based on data reduction from an originally K'-dimensional observation (K' > K), using the variance-reducing property of averaging independent errors to produce a statistically more accurate “pseudo-observation.”
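The variance reduction described here is easy to confirm by simulation. In this hypothetical sketch each basic measurement is repeated r = K'/K = 4 times with independent N(0, σ²) errors; the averaged errors should have variance close to σ²K/K' = 0.25 when σ = 1.

```python
import random
import statistics

# Hypothetical simulation for note 28: each basic measurement is repeated `repeats`
# (= K'/K) times with independent N(0, sigma^2) errors, and the repeats are averaged.

def averaged_error_variance(sigma=1.0, repeats=4, trials=40000, seed=0):
    rng = random.Random(seed)
    averages = [statistics.fmean(rng.gauss(0.0, sigma) for _ in range(repeats))
                for _ in range(trials)]
    return statistics.pvariance(averages, mu=0.0)  # the true mean of the errors is 0

print(averaged_error_variance())  # close to sigma^2 * K / K' = 1.0 / 4 = 0.25
```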
²⁹ Note that A, the weak Gateaux derivative at x̄ of F : X → Y, is defined by η·Ah = (d/dt)[η·F(x̄ + th)] at t = 0 (for continuous linear functionals η on Y), provided this determines A : X → Y as a linear map. This is effectively what one obtains (implicitly) if, e.g., (2.9') is employed (approximately) for a nonlinear problem, provided x̄ is already a good estimate for x_*.
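In a discretized setting, the linearization can be approximated numerically one direction at a time. This sketch uses central differences and coordinate functionals on ℝ^m (a finite-dimensional stand-in for the weak definition); the map F is a hypothetical toy example, not a geophysical model.

```python
# Sketch: approximate A = F'(xbar) column by column with central differences,
# dF(xbar; h) ~ [F(xbar + t h) - F(xbar - t h)] / (2 t).

def directional_derivative(F, xbar, h, t=1e-6):
    plus = F([x + t * hi for x, hi in zip(xbar, h)])
    minus = F([x - t * hi for x, hi in zip(xbar, h)])
    return [(p - m) / (2 * t) for p, m in zip(plus, minus)]

def linearize(F, xbar, t=1e-6):
    """Matrix (list of rows) whose j-th column is dF(xbar; e_j)."""
    n = len(xbar)
    cols = [directional_derivative(F, xbar, [float(i == j) for i in range(n)], t)
            for j in range(n)]
    return [list(row) for row in zip(*cols)]

def F(x):
    """Toy nonlinear map R^2 -> R^2 (hypothetical)."""
    return [x[0] * x[1], x[0] ** 2]

print(linearize(F, [1.0, 2.0]))  # approximately [[2.0, 1.0], [2.0, 0.0]]
```

The resulting matrix can then play the role of A_K in (2.10), which is reasonable only when x̄ is already close to x_*, as the note cautions.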

³⁰ A case in point is the analysis in Seidman [1978b] (nonconvergence) of the method of least squares. Consider selection of x_K as the minimizer of ‖y_* − Ax‖ subject to x ∈ X_K (dim X_K = K). (Here it is assumed that y_* = Ax_* is observable in Y, but one might also consider this with ‖y_* − Ax‖ replaced by ‖ω − A_K x‖ if the actual observation ω were in ℝ^{K'} with K' > K; compare note 28.) Even with entirely error-free observation and calculation, it can be shown that if X_K is viewed as one of a sequence expanding to “fill” X, then the correspondingly computed sequence x_K of “approximants” need not converge to x_* (one need not even have {‖x_K‖} bounded as K → ∞) if “bad” choices of X_K are made, corresponding to permitting u to increase faster than necessary.

³¹ Even with D̂ compact, the estimates available are typically asymptotic, as for the n-widths mentioned in note 16. Of course, even for well-posed problems in numerical analysis the error estimates typically involve constants (e.g., bounds on higher derivatives) which are not known, so one's degree of confidence in the results is “asymptotic,” although “extrapolation” may permit some direct estimation: use partial information and compare the results to estimate accuracy. (This also is the crux of Wahba's “cross-validation” analysis of regularization when, in the probabilistic case, the variance for measurement errors is assumed not to be known in advance.)

³² The vagueness of this assertion with respect to explicit estimates of the errors is, in part, closely related to the considerable arbitrariness involved in the choice of norms. Even restricting oneself, as above, to quadratic (Hilbert space) norms as measures of approximation, one is free to adopt any of a wide variety of equivalent or inequivalent norms (e.g., ‖x‖² could be ∫ x²(s) ds, or ∫ x²(s) w(s) ds with weighting function w > 0, or, inequivalently, ∫ [x²(s) + |grad x(s)|²] ds, or …). The choice made will certainly affect both the interpretation and the computational convenience of the method. Particularly convenient are settings for which there are well-studied computational algorithms with the matrices (associated, e.g., with (2.10) taken with respect to an appropriately chosen basis) sparse and not too ill-conditioned; typically this might be the case if spline representations are used in connection with Sobolev norms. Compare note 17.
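The three example norms can be compared on a sampled function. This sketch discretizes [0, 1] on a uniform grid, using the trapezoidal rule for integrals and forward differences for the gradient; the grid size, the weight w(s) = 1 + s and the test function x(s) = s are illustrative assumptions.

```python
# Discretized versions of the norms listed in note 32 on a uniform grid of [0, 1].

def trapezoid(values, h):
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

def l2_sq(x, h):
    """Approximates the integral of x(s)^2."""
    return trapezoid([v * v for v in x], h)

def weighted_l2_sq(x, w, h):
    """Approximates the integral of x(s)^2 w(s)."""
    return trapezoid([v * v * wi for v, wi in zip(x, w)], h)

def h1_sq(x, h):
    """Approximates the integral of x^2 + |x'|^2, with x' from forward differences."""
    grad_sq = sum(((x[i + 1] - x[i]) / h) ** 2 for i in range(len(x) - 1)) * h
    return l2_sq(x, h) + grad_sq

n = 1001
h = 1.0 / (n - 1)
s = [i * h for i in range(n)]
x = s[:]                     # x(s) = s
w = [1.0 + si for si in s]   # weighting function w(s) = 1 + s > 0
print(l2_sq(x, h), weighted_l2_sq(x, w, h), h1_sq(x, h))  # about 1/3, 7/12, 4/3
```

The same function has noticeably different “sizes” under the three norms, which illustrates why the choice of norm shapes both the interpretation and the error estimates.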

³³ Presumably one would avoid a method for which the scheme would not be convergent (to x_*) at all. Compare note 30.